An Experimental Analysis of Robinson-Foulds Distance Matrix Algorithms

Authors

  • Seung-Jin Sul
  • Tiffani L. Williams
Abstract

In this paper, we study two fast algorithms, HashRF and PGM-Hashed, for computing the Robinson-Foulds (RF) distance matrix for a collection of evolutionary trees. The RF distance matrix represents a tremendous data-mining opportunity for helping biologists understand the evolutionary relationships depicted among their trees. The novelty of our work lies in using a variety of architecture- and implementation-independent measures (i.e., percentage of bipartition sharing, number of bipartition comparisons, and memory usage), in addition to CPU time, to explore practical algorithmic performance. Overall, our study concludes that HashRF performs better than its competitor, PGM-Hashed, across the various performance measures. Thus, the HashRF algorithm provides scientists with a fast approach for understanding the evolutionary relationships among a set of trees.
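To make the quantity under study concrete, the following is a minimal Python sketch of an RF distance matrix built from bipartition sets. It is an illustration only, not the HashRF or PGM-Hashed implementation: the names `canonical` and `rf_matrix`, the five-taxon example trees, and the inverted-index counting of shared bipartitions are all assumptions made for this example. The distance here is the unnormalized size of the symmetric difference between two trees' bipartition sets; some authors divide this value by two.

```python
from itertools import combinations

TAXA = frozenset("ABCDE")  # hypothetical taxon set for the example


def canonical(side, taxa=TAXA):
    """Represent a bipartition by the side that contains the smallest taxon,
    so that X|Y and Y|X map to the same key."""
    side = frozenset(side)
    return side if min(taxa) in side else frozenset(taxa) - side


def rf_matrix(trees):
    """trees: list of sets of canonical non-trivial bipartitions.
    Returns the matrix of unnormalized RF distances."""
    t = len(trees)
    # Inverted index: bipartition -> indices of the trees that contain it.
    index = {}
    for i, biparts in enumerate(trees):
        for b in biparts:
            index.setdefault(b, []).append(i)
    # Count shared bipartitions for every pair of trees that co-occur in a bucket.
    shared = [[0] * t for _ in range(t)]
    for ids in index.values():
        for i, j in combinations(ids, 2):
            shared[i][j] += 1
            shared[j][i] += 1
    # |A symmetric-difference B| = |A| + |B| - 2 * |A intersect B|
    return [
        [0 if i == j else len(trees[i]) + len(trees[j]) - 2 * shared[i][j]
         for j in range(t)]
        for i in range(t)
    ]


# Three unrooted trees on A..E, written as their non-trivial bipartitions.
trees = [
    {canonical("AB"), canonical("DE")},  # ((A,B),C,(D,E))
    {canonical("AC"), canonical("DE")},  # ((A,C),B,(D,E))
    {canonical("AB"), canonical("CE")},  # ((A,B),D,(C,E))
]
matrix = rf_matrix(trees)
for row in matrix:
    print(row)
```

In this sketch, the number of pair updates performed while scanning the index corresponds loosely to the "number of bipartition comparisons" measure named in the abstract, and the entries of `shared` reflect how much bipartition sharing the collection exhibits.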

Similar resources

Optimal algorithms for computing the Robinson and Foulds topologic distance between two trees and the strict consensus trees of k trees given their distance matrices

It has been postulated that existing species have been linked in the past in a way that can be described using an additive tree structure. Any such tree structure reflecting species relationships is associated with a matrix of distances between the species considered and called a distance matrix or a tree metric matrix. A circular order of elements of X corresponds to a circular (clockwise) sca...

Fast Hashing Algorithms to Summarize Large Collections of Evolutionary Trees

Different phylogenetic methods often yield different inferred trees for the same set of organisms. Moreover, a single phylogenetic approach (such as a Bayesian analysis) can produce many trees. Consensus trees and topological distance matrices are often used to summarize the evolutionary relationships among the trees of interest. These summarization techniques are implemented in current phyloge...

Algorithms for Computing Cluster Dissimilarity between Rooted Phylogenetic Trees

Phylogenetic trees represent the historical evolutionary relationships between different species or organisms. Creating and maintaining a repository of phylogenetic trees is one of the major objectives of molecular evolution studies. One way of mining phylogenetic information databases would be to compare the trees by using a tree comparison measure. Presented here are a new dissimilarity measu...

Comparison of Additive Trees Using Circular Orders

It has been postulated that existing species have been linked in the past in a way that can be described using an additive tree structure. Any such tree structure reflecting species relationships is associated with a matrix of distances between the species considered which is called a distance matrix or a tree metric matrix. A circular order of elements of X corresponds to a circular (clockwise...

A Randomized Algorithm for Comparing Sets of Phylogenetic Trees

Phylogenetic analyses often produce a large number of candidate evolutionary trees, each a hypothesis of the "true" tree. Post-processing techniques such as strict consensus trees are widely used to summarize the evolutionary relationships into a single tree. However, valuable information is lost during the summarization process. A more elementary step is to produce estimates of the topological di...


Publication date: 2008